Basic Usage
This tutorial walks through the complete xcode_debarcode workflow using a real dataset.
Prerequisites
CyTOF data file (FCS or H5AD)
Channel mapping CSV file
Example Dataset
| Item | Value |
|---|---|
| Data file | 15K_test.fcs |
| Mapping file | barcode_channel_mapping_18ch.csv |
| Configuration | 2-block barcode (18 channels) |
Workflow Overview
Load Data -> Map Channels -> Transform -> Filter Intensity -> Debarcode
-> Inspect Distribution -> Hamming Clustering -> Confidence Filtering -> (Optional) Cohesion -> Save
1. Load and Map Data
Load Data
import warnings
warnings.filterwarnings('ignore')
import xcode_debarcode as xd
import plotly.io as pio
pio.renderers.default = 'notebook'
DATA_DIR = '../../data'
adata = xd.io.read_data(f'{DATA_DIR}/15K_test.fcs')
print(f"Loaded {adata.n_obs} cells, {adata.n_vars} channels")
Loaded 33168 cells, 42 channels
Supported formats
.fcs: Flow Cytometry Standard (requires the readfcs package)
.h5ad: AnnData HDF5 format (preserves all metadata)
Map Barcode Channels
Channel mapping identifies which CyTOF channels correspond to barcode sequences and renames them to a standardized format (s_1, s_2, …).
Mapping CSV structure:
| bc_sequence | channel_name |
|---|---|
| 1 | 163Dy_1 |
| 2 | 158Gd_2 |
| … | … |
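Conceptually, the mapping turns each CSV row into a rename from the raw channel name to s_<bc_sequence>. The sketch below illustrates that idea with the standard library only. build_channel_rename is a hypothetical helper, not part of xcode_debarcode, and 89Y_CD45 is an invented non-barcode channel name used only for illustration.

```python
import csv
import io

def build_channel_rename(csv_text):
    """Hypothetical helper: map raw channel names to s_1, s_2, ... from a mapping CSV.

    Assumes the columns are named 'bc_sequence' and 'channel_name'.
    """
    rename = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        position = int(row["bc_sequence"])
        rename[row["channel_name"].strip()] = f"s_{position}"
    return rename

# Only the two rows shown in the table; a real mapping file lists all 18 channels.
mapping_csv = """bc_sequence,channel_name
1,163Dy_1
2,158Gd_2
"""
rename = build_channel_rename(mapping_csv)

# Non-barcode channels (the invented 89Y_CD45 here) keep their original names
channels = ["89Y_CD45", "163Dy_1", "158Gd_2"]
renamed = [rename.get(ch, ch) for ch in channels]
print(renamed)
```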
adata = xd.io.map_channels(adata, f'{DATA_DIR}/barcode_channel_mapping_18ch.csv')
Channel mapping complete
Barcode channels: 18 (stored in adata.uns['barcode_channels'])
Other channels: 24
Total channels: 42
2. Transform Data
Apply log or arcsinh transformation to stabilize variance and normalize intensities.
| Method | Formula | Recommended for |
|---|---|---|
| log | log(x + 1) | All methods (recommended) |
| arcsinh | arcsinh(x / cofactor) | PreMessa |
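Both formulas fit in a few lines. This sketch assumes the natural logarithm and a cofactor of 5 (a common choice for CyTOF data); neither value is taken from xcode_debarcode's defaults.

```python
import math

def log_transform(x):
    # log(x + 1); math.log1p is accurate near zero and maps 0 to 0
    return math.log1p(x)

def arcsinh_transform(x, cofactor=5.0):
    # arcsinh(x / cofactor); cofactor=5 is an assumed value, not the library default
    return math.asinh(x / cofactor)

raw = [0.0, 10.0, 100.0, 1000.0]
log_values = [log_transform(v) for v in raw]
asinh_values = [arcsinh_transform(v) for v in raw]
print([round(v, 2) for v in log_values])
print([round(v, 2) for v in asinh_values])
```

Both compress the high end of the intensity range. arcsinh is approximately linear when x is small relative to the cofactor and approximately logarithmic for large x.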
adata = xd.preprocessing.transform(adata, method='log')
Transformation complete
Method: log
Layer: 'log'
Transformed: 42 channels x 33168 cells
Barcode channels: 18
Transformed data is stored in adata.layers['log'].
Visualize Channel Distributions
fig = xd.plots.plot_channel_intensities(adata, layer='log')
fig
Bimodal distributions indicate good separation between ON/OFF states. Since our channels show clear bimodality, we will use the PC-GMM method (recommended default).
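To put a number on "clear bimodality", one common heuristic is Sarle's bimodality coefficient: values above 5/9 (about 0.555) suggest a bimodal distribution. This is an illustrative check that could be run on one barcode channel's log values; it is not the criterion xcode_debarcode uses, and it uses uncorrected (population) moments for simplicity.

```python
def bimodality_coefficient(values):
    """Sarle's bimodality coefficient; values above 5/9 suggest bimodality.

    Uses population (uncorrected) moments for simplicity; needs at least 4 values.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    if m2 == 0:
        return 0.0  # constant data: treat as unimodal
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return (skew ** 2 + 1) / (excess_kurtosis + correction)

# Toy channels: two well-separated modes versus a single sharp peak
bimodal = [0.0] * 50 + [10.0] * 50
peaked = [0.0] * 2 + [5.0] * 96 + [10.0] * 2
print(round(bimodality_coefficient(bimodal), 3), round(bimodality_coefficient(peaked), 3))
```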
3. Intensity Filtering
Remove debris (low intensity) and doublets (high intensity) before debarcoding.
Visualize Cell Distribution
fig = xd.plots.plot_intensity_scatter(adata, layer='log')
fig
Preview Filter Boundaries
fig = xd.plots.plot_intensity_scatter(adata, layer='log', method='ellipsoidal', percentile=95.0)
fig
Apply Filter
adata = xd.preprocessing.filter_cells_intensity(
adata,
layer='log',
method='ellipsoidal',
percentile=95.0,
filter_or_flag='filter'
)
print(f"Remaining cells: {adata.n_obs}")
Intensity filtering: method=ellipsoidal, layer=log
Channel sum: min=5.65, max=128.49, mean=76.60
Channel var: min=0.1012, max=8.87, mean=2.26
Using robust Mahalanobis distance (percentile=95.0)
Mahalanobis threshold: 3.34
Center: sum=78.76, var=2.2240
Pass: 31509/33168 cells (95.0%)
[+] Filtered to 31509 cells
[+] Metadata saved: adata.uns['intensity_filtering']['intensity']
Remaining cells: 31509
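The log output indicates that the filter works on two per-cell features, the channel sum and the channel variance, and keeps cells whose Mahalanobis distance from a center falls within the 95th percentile. The sketch below reproduces that idea on simulated cells. It is a simplified stand-in, not the library's implementation: the center is the feature-wise median and the covariance is a plain sample covariance around it, whereas the library reports a robust estimate. The simulated debris and doublet cells are deliberately crude.

```python
import math
import random
import statistics

def intensity_features(cells):
    # Per cell: (sum, variance) across its barcode channels
    return [(sum(c), statistics.pvariance(c)) for c in cells]

def mahalanobis_filter(features, percentile=95.0):
    n = len(features)
    xs = [f[0] for f in features]
    ys = [f[1] for f in features]
    # Median center (a simple robust location estimate)
    cx, cy = statistics.median(xs), statistics.median(ys)
    # Plain 2x2 covariance around the median (not a robust estimator)
    sxx = sum((x - cx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - cy) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - cx) * (y - cy) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det  # inverse covariance
    dists = [
        math.sqrt(ixx * (x - cx) ** 2 + 2 * ixy * (x - cx) * (y - cy) + iyy * (y - cy) ** 2)
        for x, y in features
    ]
    # Nearest-rank percentile of the distances as the cut-off
    k = min(n - 1, math.ceil(percentile * n / 100) - 1)
    threshold = sorted(dists)[k]
    return [d <= threshold for d in dists], threshold

rng = random.Random(0)
# 190 "normal" cells: 9 channels ON (~5) and 9 OFF (~1) on a log scale
cells = [[rng.gauss(5.0 if i < 9 else 1.0, 0.5) for i in range(18)] for _ in range(190)]
cells += [[rng.gauss(0.2, 0.1) for _ in range(18)] for _ in range(5)]  # debris-like
cells += [[rng.gauss(9.0, 0.5) for _ in range(18)] for _ in range(5)]  # doublet-like
mask, threshold = mahalanobis_filter(intensity_features(cells))
print(f"Pass: {sum(mask)}/{len(cells)} cells (threshold {threshold:.2f})")
```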
4. Debarcoding
Choose a Method
| Method | Best for | Notes |
|---|---|---|
| PC-GMM | Bimodal regime | Recommended default; best coverage-accuracy trade-off |
| PreMessa | Unimodal regime | For channels lacking bimodality |
| GMM | Exploration | Assignments not constrained to valid patterns |
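To see why plain GMM output is not constrained to valid patterns, consider the simplest version: fit a two-component Gaussian mixture to each channel independently and call a channel ON when the higher-mean component is more likely. Each channel is decided on its own, so the resulting bit pattern can be any combination. This is a minimal EM sketch on toy log-scale values, not xcode_debarcode's GMM implementation.

```python
import math

def fit_gmm_1d(values, n_iter=100, min_var=1e-6):
    """Two-component 1D Gaussian mixture fitted by EM; returns P(ON) per value."""
    n = len(values)
    mu = [min(values), max(values)]  # component 1 (ON) starts at the maximum
    var = [max(((mu[1] - mu[0]) / 4) ** 2, min_var)] * 2
    weight = [0.5, 0.5]

    def responsibilities():
        # E-step: posterior probability that each value belongs to the ON component
        out = []
        for x in values:
            dens = [weight[k] * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                    / math.sqrt(2 * math.pi * var[k]) for k in range(2)]
            total = dens[0] + dens[1]
            out.append(dens[1] / total if total > 0 else 0.5)
        return out

    for _ in range(n_iter):
        r = responsibilities()
        n_on = sum(r)
        n_off = n - n_on
        if n_on == 0 or n_off == 0:
            break  # one component collapsed; stop updating
        # M-step: update means, variances and mixing weights
        mu = [sum((1 - ri) * x for ri, x in zip(r, values)) / n_off,
              sum(ri * x for ri, x in zip(r, values)) / n_on]
        var = [max(sum((1 - ri) * (x - mu[0]) ** 2 for ri, x in zip(r, values)) / n_off, min_var),
               max(sum(ri * (x - mu[1]) ** 2 for ri, x in zip(r, values)) / n_on, min_var)]
        weight = [n_off / n, n_on / n]
    return responsibilities()

def binarize(channels):
    # channels: one list of values per channel, cells in the same order
    posteriors = [fit_gmm_1d(vals) for vals in channels]
    n_cells = len(channels[0])
    return ["".join("1" if post[i] > 0.5 else "0" for post in posteriors)
            for i in range(n_cells)]

channel_a = [0.9, 1.1, 1.0, 1.2, 5.0, 5.2, 4.8, 5.1]
channel_b = [5.1, 4.9, 1.0, 0.8, 5.0, 5.3, 1.1, 0.9]
patterns = binarize([channel_a, channel_b])
print(patterns)
```

Nothing in this sketch checks whether a pattern such as "10" is a valid barcode, which is the limitation the table notes for GMM.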
Apply Debarcoding
adata = xd.debarcode.debarcode(
adata,
method='pc_gmm',
layer='log'
)
PC-GMM: Processing 18 channels, 31509 cells (n_init=5)
PC-GMM: Complete. Mean confidence: 0.7987
Debarcoding complete using pc_gmm (saved as 'pc_gmm')
Assignment: adata.obs['pc_gmm_assignment']
Confidence: adata.obs['pc_gmm_confidence'] (raw score)
[+] Metadata saved: adata.uns['debarcoding']['pc_gmm']
After debarcoding:
adata.obs['pc_gmm_assignment']: Assigned barcode patterns
adata.obs['pc_gmm_confidence']: Raw method confidence score (product of per-block posteriors for PC-GMM)
The confidence score is the debarcoding method's native score. It drives coverage-accuracy trade-off analysis: cells are ranked by confidence and the top N% are kept.
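For PC-GMM the confidence is described as the product of per-block posteriors, so one uncertain block pulls the whole score down. The sketch below shows that arithmetic for a 2-block barcode, followed by the rank-and-keep-top-N% selection. The per-block posterior values and cell names are invented for illustration.

```python
import math

def combined_confidence(block_posteriors):
    # Product of per-block posteriors (as described for PC-GMM)
    return math.prod(block_posteriors)

def top_fraction(conf, fraction):
    # Rank cells by confidence (highest first) and keep the top `fraction`
    ranked = sorted(conf, key=conf.get, reverse=True)
    n_keep = int(round(len(ranked) * fraction))
    return ranked[:n_keep]

# Hypothetical posteriors for block 1 and block 2 of each cell
cells = {
    "cell_a": [0.99, 0.95],
    "cell_b": [0.90, 0.40],  # one weak block drags the score down
    "cell_c": [0.60, 0.98],
    "cell_d": [0.97, 0.99],
}
conf = {cell: combined_confidence(p) for cell, p in cells.items()}
print(top_fraction(conf, 0.5))
```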
5. Inspect Barcode Distribution
Barcode Rank Histogram
fig = xd.plots.plot_barcode_rank_histogram(adata, assignment_col='pc_gmm_assignment')
fig
True barcodes typically have high cell counts, while noisy patterns have low counts.
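The rank histogram is essentially a count of cells per assigned pattern, sorted from most to least frequent. A minimal stdlib sketch, using made-up 6-bit pattern strings (the real strings come from adata.obs['pc_gmm_assignment'] and may be formatted differently):

```python
from collections import Counter

# Toy assignments standing in for the assignment column
assignments = ["101100", "101100", "010011", "101100", "010011",
               "111000", "101100", "010011", "100100"]

counts = Counter(assignments)
ranked = counts.most_common()  # [(pattern, n_cells), ...], largest first
for rank, (pattern, n_cells) in enumerate(ranked, start=1):
    print(f"rank {rank}: {pattern} ({n_cells} cells)")
```

Patterns seen only once, such as 111000 and 100100, form the low-count tail.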
Cumulative Barcode Rank
fig = xd.plots.plot_cumul_barcode_rank(adata, assignment_col='pc_gmm_assignment')
fig
A strong elbow suggests clear separation between real barcodes (high plateau) and junk patterns (tail).
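One way to locate the elbow programmatically is to build the cumulative fraction of cells over ranked barcodes and pick the rank with the largest vertical gap above the straight line joining the first and last points. This is a generic knee heuristic, not a function provided by xcode_debarcode, and the counts below are invented.

```python
def cumulative_fraction(sorted_counts):
    # Running share of all cells captured by the top-k barcodes
    total = sum(sorted_counts)
    running, out = 0, []
    for c in sorted_counts:
        running += c
        out.append(running / total)
    return out

def elbow_index(curve):
    # Index with the largest vertical gap above the chord from first to last point
    n = len(curve)
    if n < 3:
        return n - 1
    slope = (curve[-1] - curve[0]) / (n - 1)
    gaps = [y - (curve[0] + slope * i) for i, y in enumerate(curve)]
    return max(range(n), key=gaps.__getitem__)

counts = [510, 495, 480, 470, 12, 8, 5, 3, 2, 1]  # already sorted, largest first
curve = cumulative_fraction(counts)
elbow = elbow_index(curve)
print(f"Elbow at rank {elbow + 1}: top {elbow + 1} barcodes hold {curve[elbow]:.1%} of cells")
```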
6. Hamming Clustering
Hamming clustering corrects noisy assignments by merging small patterns into nearby valid patterns.
Why Hamming Clustering?
True barcodes have high cell counts
Noisy assignments create low-count patterns near true barcodes
If the barcode sublibrary in use is much smaller than the full library, collisions between true barcodes are rare
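The reasoning above translates into a simple rule: reassign a low-count pattern to a nearby pattern (within radius bit flips) that has many more cells. The sketch below is a simplified single-pass version under two assumptions: ratio means the target must have at least ratio times as many cells as the source, and ties are broken by the larger target count instead of the tie_break='lda' option used in this tutorial. It is not a reimplementation of method='msg'.

```python
from collections import Counter

def hamming(a, b):
    # Number of positions where two equal-length bit strings differ
    return sum(x != y for x, y in zip(a, b))

def hamming_merge(assignments, radius=2, ratio=25.0):
    counts = Counter(assignments)
    remap = {}
    for pattern, n in counts.items():
        # Candidates: within `radius` flips and at least `ratio` times more cells (assumed meaning)
        candidates = [
            (hamming(pattern, other), -m, other)
            for other, m in counts.items()
            if other != pattern and hamming(pattern, other) <= radius and m >= ratio * n
        ]
        if candidates:
            # Nearest first; among equally near targets, the one with more cells
            remap[pattern] = min(candidates)[2]
    corrected = [remap.get(p, p) for p in assignments]
    return corrected, remap

assignments = (["110000"] * 100 + ["001100"] * 80
               + ["110001"] * 2 + ["111100"] + ["000011"] * 3)
corrected, remap = hamming_merge(assignments)
print(remap)
```

Because the two high-count patterns are far apart, 000011 has no large neighbor within the radius and is left unchanged, while 111100 (two flips from both) goes to the larger of the two.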
See Hamming Guidelines for parameter tuning.
Apply Hamming Clustering
adata = xd.postprocessing.hamming_cluster(
adata,
assignment_col='pc_gmm_assignment',
confidence_col='pc_gmm_confidence',
method='msg',
radius=2,
ratio=25.0,
tie_break='lda',
layer='log'
)
Hamming clustering: method=msg, radius=2, ratio=25.0, ratio_metric=count, tie_break=lda
Eligible: all cells (31509)
Remapped: 893/31509 cells (2.8%)
Ties encountered: 144, LDA applied: 144
[+] Clustering saved in adata.uns['hamming_clustering']['pc_gmm_assignment']
[+] Results stored in:
- adata.obs['pc_gmm_hamming_assignment']
- adata.obs['pc_gmm_hamming_remapped']
After Hamming clustering:
adata.obs['pc_gmm_hamming_assignment']: Corrected assignments
adata.obs['pc_gmm_hamming_remapped']: Boolean mask of remapped cells
Note: Confidence values are not modified by Hamming clustering. Continue to use pc_gmm_confidence for filtering.
Barcode Rank After Clustering
fig = xd.plots.plot_barcode_rank_histogram(adata, assignment_col='pc_gmm_hamming_assignment')
fig
7. Confidence Filtering
Filter cells based on their confidence scores to select high-quality assignments:
adata = xd.postprocessing.filter_cells_conf(
adata,
confidence_col='pc_gmm_confidence',
method='percentile',
value=90,
filter_or_flag='flag'
)
Percentile threshold (keep top 90%): 0.3169
Pass: 28358/31509 cells (90.0%)
[+] Added flag column: adata.obs['pc_gmm_pass']
[+] Metadata saved: adata.uns['confidence_filtering']['pc_gmm_pass']
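This keeps the top 90% of cells ranked by the raw debarcoding confidence score. Because filter_or_flag='flag', no cells are removed; passing cells are flagged in adata.obs['pc_gmm_pass'].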
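Keeping the top 90% amounts to computing the 10th percentile of the confidence scores and flagging cells at or above it. The sketch below uses linear interpolation between ranks (the same rule as NumPy's default percentile); whether filter_cells_conf interpolates the same way is an assumption, and the scores are invented.

```python
import math

def percentile(values, q):
    # Linear interpolation between closest ranks (NumPy's default 'linear' method)
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def flag_top_percent(confidences, keep=90):
    # Flag (do not remove) cells whose confidence is in the top `keep` percent
    threshold = percentile(confidences, 100 - keep)
    return [c >= threshold for c in confidences], threshold

scores = [0.97, 0.05, 0.61, 0.31, 0.85, 0.42, 0.7, 0.9, 0.55, 0.78]
flags, threshold = flag_top_percent(scores)
print(f"threshold={threshold:.3f}, pass={sum(flags)}/{len(scores)}")
```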
8. Cohesion (Optional)
Cohesion measures how close each cell is to its assigned cluster centroid. This is an optional end-of-workflow QC step to identify outlier cells within clusters.
adata = xd.postprocessing.add_cohesion(
adata,
assignment_col='pc_gmm_hamming_assignment',
layer='log',
min_cells=5
)
After cohesion computation:
adata.obs['pc_gmm_hamming_cohesion']: Cohesion score in [0, 1]
1.0 = at centroid; lower = farther from centroid
0.0 = unscored (cluster has fewer than min_cells cells)
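The exact cohesion formula is not spelled out here, so the following is a hypothetical score with the same properties: 1.0 at the cluster centroid, decreasing with distance, and 0.0 for clusters with fewer than min_cells cells. Distances are divided by each cluster's median distance so that tight and loose clusters are scored on a comparable scale; that scaling is an assumption. The 2-D points stand in for cells' transformed barcode intensities.

```python
import math
import statistics
from collections import defaultdict

def cohesion_scores(points, labels, min_cells=5):
    """Hypothetical cohesion: 1 / (1 + d / median_d) within each cluster."""
    members = defaultdict(list)
    for i, label in enumerate(labels):
        members[label].append(i)
    scores = [0.0] * len(points)  # 0.0 marks unscored cells
    for idx in members.values():
        if len(idx) < min_cells:
            continue  # cluster too small: leave its cells unscored
        dim = len(points[idx[0]])
        centroid = [statistics.fmean(points[i][d] for i in idx) for d in range(dim)]
        dist = {i: math.dist(points[i], centroid) for i in idx}
        scale = statistics.median(dist.values()) or 1.0  # avoid dividing by zero
        for i in idx:
            scores[i] = 1.0 / (1.0 + dist[i] / scale)
    return scores

points = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1), (5, 5), (6, 5)]
labels = ["A"] * 5 + ["B"] * 2
print(cohesion_scores(points, labels))
```

With this formula every scored cell gets a value above 0, so 0.0 unambiguously means "unscored".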
Use cohesion for optional post-hoc filtering of cluster outliers:
# Example: keep cells with cohesion above 0.3 (this also drops unscored cells, whose cohesion is 0.0)
adata_filtered = adata[adata.obs['pc_gmm_hamming_cohesion'] > 0.3]
9. Save Results
xd.io.write_data(adata, 'debarcoded_15K.h5ad')
The H5AD file preserves:
All channels (barcode + phenotypic markers)
All layers (raw, transformed)
All metadata (assignments, confidence, filtering flags)
Accessing results
After debarcoding, key results are in:
# Assignments and confidence
adata.obs['pc_gmm_assignment'] # Barcode pattern strings
adata.obs['pc_gmm_confidence'] # Raw method confidence
# After Hamming clustering
adata.obs['pc_gmm_hamming_assignment'] # Corrected assignments
adata.obs['pc_gmm_hamming_remapped'] # Was cell remapped?
# Filtering flags
adata.obs['intensity_pass'] # Intensity filter (this tutorial used filter_or_flag='filter', which removes cells instead of flagging them)
adata.obs['pc_gmm_pass'] # Confidence filter
# Optional cohesion (if computed)
adata.obs['pc_gmm_hamming_cohesion'] # Cluster cohesion
# Method parameters
adata.uns['debarcoding']['pc_gmm'] # Debarcoding parameters
adata.uns['hamming_clustering'] # Hamming parameters
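A typical next step is to count cells per corrected barcode among cells that passed the confidence filter. The stdlib sketch below uses toy lists standing in for the adata.obs columns. With real data, adata.obs.loc[adata.obs['pc_gmm_pass'], 'pc_gmm_hamming_assignment'].value_counts() should give the same table, assuming pc_gmm_pass is a boolean column.

```python
from collections import Counter

# Toy stand-ins for two adata.obs columns (values are made up)
hamming_assignment = ["110000", "110000", "001100", "110000", "001100", "000011"]
conf_pass = [True, False, True, True, True, False]

def counts_passing(assignments, passed):
    # Cells per corrected barcode, counting only cells flagged as passing
    return Counter(a for a, ok in zip(assignments, passed) if ok)

print(counts_passing(hamming_assignment, conf_pass).most_common())
```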
Alternative: Pipeline Function
For standard workflows, use the integrated pipeline:
adata = xd.io.read_data(f'{DATA_DIR}/15K_test.fcs')
adata = xd.io.map_channels(adata, f'{DATA_DIR}/barcode_channel_mapping_18ch.csv')
adata = xd.debarcode.debarcoding_pipeline(
adata,
method='pc_gmm',
transform_method='log',
apply_intensity_filter=True,
intensity_method='ellipsoidal',
apply_hamming=True,
hamming_ratio=25.0,
apply_cohesion=False,
apply_confidence_filter=True,
confidence_filter_method='percentile',
confidence_value=90
)
Summary
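| Step | Function | Key outputs |
|---|---|---|
| Load | xd.io.read_data | AnnData object |
| Map | xd.io.map_channels | adata.uns['barcode_channels'] |
| Transform | xd.preprocessing.transform | adata.layers['log'] |
| Intensity filter | xd.preprocessing.filter_cells_intensity | Filtered cells; adata.uns['intensity_filtering'] |
| Debarcode | xd.debarcode.debarcode | adata.obs['pc_gmm_assignment'], adata.obs['pc_gmm_confidence'] |
| Hamming | xd.postprocessing.hamming_cluster | adata.obs['pc_gmm_hamming_assignment'], adata.obs['pc_gmm_hamming_remapped'] |
| Confidence filter | xd.postprocessing.filter_cells_conf | adata.obs['pc_gmm_pass'] |
| Cohesion (optional) | xd.postprocessing.add_cohesion | adata.obs['pc_gmm_hamming_cohesion'] |
| Save | xd.io.write_data | .h5ad file |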
Next Steps
Method Comparison: detailed comparison of debarcoding methods
Hamming Guidelines: parameter tuning for Hamming clustering
Simulation: generate synthetic data for testing